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Abstract: 

A reduced protein model with five to six atoms per amino acid and five amino acid 
types is developed and tested on a three-helix-bundle protein, a 46-amino acid frag- 
ment from staphylococcal protein A. The model does not rely on the widely used Go 
approximation where non-native interactions are ignored. We find that the collapse 
transition is considerably more abrupt for the protein A sequence than for random 
sequences with the same composition. The chain collapse is found to be at least as 
fast as helix formation. Energy minimization restricted to the thermodynamically 
favored topology gives a structure that has a root-mean-square deviation of 1.8 A 
from the native structure. The sequence- dependent part of our potential is pair- 
wise additive. Our calculations suggest that fine-tuning this potential by parameter 
optimization is of limited use. 

*E-mail: favrin, anders, stefan@thep.lu.se 
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1 Introduction 



In recent years, several important insights have been gained into the physical princi- 
ples of protein folding Still, in terms of quantitative predictions, it is clear that 
it would be extremely useful to be able to perform more realistic folding simulations 
than what is currently possible. In fact, most models that have been used so far for 
statistical-mechanical simulations of folding rely on one or both of two quite drastic 
approximations, the lattice and Go [0 approximations. 



The reason that lattice models have been used to study basics of protein folding is 
partly computational, but also physical — on the lattice, it is known what potential 
to use in order for stable and fast-folding sequences to exist (a simple contact poten- 
tial is sufficient). How to satisfy these criteria for off-lattice chains is, by contrast, 
largely unknown, and therefore many current off-lattice models |l5|,^|-p!^ use Go-type 
potentials |0] where non-native interactions are ignored. The use of the Go approxi- 
mation has some support from the finding that the native structure is a determinant 
for folding kinetics |15|,IT6[ . However, it is an uncontrolled approximation, and it is, of 



course, useless when it comes to structure prediction, as it requires prior knowledge 
of the native structure. 



In this paper, we discuss an off-lattice model that does not follow the Go prescription. 
Using this model, we perform extensive folding simulations for a small helical protein. 
The force field of the model is simple and based on hydrogen bonds and effective 
hydrophobicity forces (no explicit water). There exist other non Go-like models with 
more elaborate force fields that have been used for structure prediction with some 
success |T^-|T^. However, it is unclear what the dynamical properties of these models 
are. 



The original version of our model was presented in Ref. and has three types of 
amino acids: hydrophobic, polar and glycine. This version was applied to a designed 
three-helix-bundle protein with 54 amino acids [^]. For a suitable relative strength of 
the hydrogen bonds and hydrophobicity forces, it was found that this sequence does 
form a stable three-helix bundle, except for a twofold topological degeneracy, and 
that its folding transition is first-order-like and coincides with the collapse transition 
(the parameter a of Ref. is zero). 



Here, we extend this model from three to five amino acid types, by taking alanine to 
be intermediate in hydrophobicity between the previous two hydrophobic and polar 
classes, and by introducing a special geometric representation for proline, which is 



2 



needed to be able to mimic the helix-breaking property of this amino acid. Otherwise, 
the model is the same as before. The modified model is tested on a real three-helix- 
bundle protein, the 10-55-amino acid fragment of the B domain of staphylococcal 
protein A. The structure of this protein has been determined by NMR |21|, and an 
energy-based structure prediction method has been tested on the sequence [jl^. The 
folding properties have been studied too, both experimentally ||2^,|23| and theoreti- 
cally |,|lO|,|Tl|,||^. In particular, this means that we can compare the behavior of 



previous Go-like models to that of our more realistic model. 



2 Materials and Methods 



2.1 Geometry 



Our model is an extension of that introduced in Ref. It uses three different 

amino acid representations: one for glycine, one for proline and one for the rest. The 
non-glycine, non-proline representation is illustrated in Fig. Qa, and is identical to 
that of hydrophobic and polar amino acids in the original model. The three backbone 
atoms N, Cq, and C are all included, whereas the side chain is represented by a single 
atom, a large C/3. The remaining two atoms, H and O, are used to define hydrogen 
bonds. The representation of glycine is the same except that C/3 is missing. 

The representation of proline is new compared to the original model. The side chain 
of proline is attached to the backbone not only at C^, but also at N. A well-known 
consequence of this is that proline can act as a helix breaker. For the model to be 
able to capture this important property, we introduce a special representation for 
proline, which is illustrated in Fig. |l]b. It differs from that in Fig. [^a in two ways: 
first, the Ramachandran angle is held constant, at —65°; and second, the H atom 
is replaced by a side-chain atom, C5. This more realistic representation of proline is 
needed when studying the protein A fragment which has one proline at each of the 
two turns. 



All amino acids except proline have the Ramachandran torsion angles and ip (see 
Fig. |l|a) as their degrees of freedom, whereas ip is the only degree of freedom for 
proline. All bond lengths, bond angles and peptide torsion angles (180°) are held 
fixed. Numerical values of the bond lengths and bond angles can be found in Ref. pO[] 
and Fig. 03. 
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Figure 1: (a) Schematic figure showing the common geometric representation for all 
amino acids except glycine and proline, (b) The representation of proline. The 
atom is assumed to lie in the plane of the N, Cq, and atoms. The N-C^ bond 
vector w is given hy w = —0.596^ + 0.910^, where the vectors u and v are defined 
in the figure. The numerical factors were obtained by an analysis of structures from 
the Protein Data Bank (PDB) |2^. 

The helix-breaking property of proline manifests itself clearly in the shape of the 
distribution for amino acids that are followed by a proline in the sequence (with the 
proline on their C side). Helical values of ip are suppressed for such amino acids. 
This is illustrated in Fig. ^a, where the peak on the left corresponds to a-helix. From 
Fig. it can be seen that the model shows a qualitatively similar behavior. 



2.2 Force Field 



Our energy function 



E = Eioc + Ess. + -E'hb + E, 



^col 



is composed of four terms. The first two terms E\oc and E^^ are local 0, ip and self- 
avoidance potentials, respectively (see Ref. pO[). The third term is the hydrogen- 
bond energy i?hb, which is given by 



E, 



hb 



ehbE 



5 ^ 



12 



o"hb 



10' 



V (ofjj , Pij ) 



cos^ aij cos^ Pij ttij, Pij > 90 
otherwise 



(2) 
(3) 
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Figure 2: (a) Distributions of the Ramachandran angle ip, based on PDB data. The 
full (dashed) line represents non-glycine, non-proline amino acids that are followed 
by a non-proline (proline) in the sequence, (b) The corresponding histograms for the 
model, as obtained by simulations of Gly-X-X (full line) and Gly-X-Pro (dashed line) 
at kT = 0.55, where X denotes polar amino acids (shown is the ip distribution for 
the middle of the three amino acids). 



where i and j represent H and O atoms, respectively, and where denotes the HO 
distance, aij the NHO angle, and Pij the HOC' angle. 



The last term in Eq. (|I]), the hydrophobicity or collapse energy Ecoi, has the form 



i<j 



Ccol 



12 



O^col 



(4) 



where the sum runs over all possible CpCp pairs and Sj denotes amino acid type. To 
define A(sj,Sj), we divide the amino acids into three classes: hydrophobic (H; Leu, 
He, Phe), alanine (A; Ala) and polar (P; Arg, Asn, Asp, Gin, Glu, His, Lys, Pro, 
Ser, Tyr).[] There are then six kinds of C^C/? pairs, and the corresponding A(sj, Sj) 
values are taken to be 



1 for HH and HA pairs 

for HP, AA, AP and PP pairs 



(5) 



The main change in the force field compared to Ref. is that alanine forms its own 
hydrophobicity class, besides the previous two hydrophobic and polar classes. Alanine 

t Cys, Met, Thr, Trp and Val do not occur in the sequence studied. 
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is taken as intermediate in hydrophobicity, meaning that there is a hydrophobic 
interaction between HA pairs but not between AA pairs. In addition, the interaction 
strength ecoi is increased shghtly, from 2.2 to 2.3.0 Finally, in the self-avoidance 
potential, the atom of proline is assigned the same size as atoms. Otherwise, 



the entire force field, including parameter values, is exactly the same as in Ref. 20 



With these changes in geometry and force field, we end up with five different amino 
acid types in the new model. First, we have hydrophobic, alanine and polar which 
share the same geometric representation but differ in hydrophobicity, and then glycine 
and proline with their special geometries. 



In this paper, we test this model on the 10-55-amino acid fragment of the B domain of 
staphylococcal protein A. Calculated structures are compared to the minimized aver- 
age NMR structure with PDB code Ibdd. Throughout the paper, this structure 
is referred to as the native structure. 



As a first test of our model, two different fits to the native structure were made. 
The first fit is purely geometrical. Here, we simply minimized the root-mean-square 
deviation (rmsd) from the native structure, 6 (calculated over all backbone atoms). 
This was done by using simulated annealing, and the best result was 5 = 0.14 A. In 
the second fit, we took into account the limitations imposed by the first three terms 
of the potential, by minimizing the function 

E = Eio, + + ^hb + E(ri - , (6) 

i 

where k = 1 and {r°} denotes the structure obtained from the first fit. The 
minimum-^ structure had 6 = 0.32 A. These results show that our model, in spite 
of relatively few degrees of freedom, permits a quite accurate description of the real 
structure. 



2.3 Numerical Methods 



To simulate the thermodynamic behavior of this model, we use simulated temper- 



ing p8|-p0|, which means that the temperature is a dynamical variable (for details, 
see Refs. P5|-p0[). The temperature update is a standard Metropolis step. Our con- 
formation updates are of two different types: the simple non-local pivot move where 



^The energy unit is dimensionless and such that kTc ~ 0.62, Tc being the collapse temperature 
(see Sec. ||). 
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Figure 3: Monte Carlo evolution of the energy in a simulated-tempering run. 



a single torsion angle is turned, and the semi-local biased Gaussian step proposed in 



Ref. I^Tf. The latter method works with the Ramachandran angles of four adjacent 
amino acids. These are turned with a bias toward local rearrangements of the chain. 
The degree of bias is governed by a parameter b. In our thermodynamic simulations, 
we take b = 10 (rad/A)^, which gives a strong bias toward deformations that are 
approximately local []3l| . 



Figure |^ shows the evolution of the energy in a simulated-tempering run that took 
about two weeks on an 800 MHz processor. Data corresponding to all the different 
temperatures are shown (eight temperatures, ranging from kT = 0.54 to kT = 0.90). 
We see that there are many independent visits to low-energy states, which is nec- 
essary in order to get a reliable estimate of the relative populations of the folded 
and unfolded states. To test the usefulness of the semi-local update, we repeated 
the same calculation using pivot moves only. The difference in performance was not 
quantified, but it was clear that the sampling of low energies was less efficient in the 
run relying solely on pivot moves. 



For our kinetic simulations, we do not use the pivot update but only the semi-local 
method. The parameter b is taken to be 1 (rad/A)^ in the kinetic runs, which turned 
out to give an average change in the end-to-end vector squared of about 0.5 A^. 
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Figure 4: The radius of gyration (in A) against temperature. Full and dashed lines 
represent the protein A sequence and the three random sequences (see the text), 
respectively. 



3 Results and Discussion 



3.1 Thermodynamics 

We begin our study of the model defined in Sec. ^ by locating the collapse transition. 
In Fig. ^, we show the radius of gyration (calculated over all backbone atoms) against 
temperature for both the protein A sequence and three random sequences with the 
same length and composition. The random sequences were generated keeping the 
two prolines of the protein A sequence fixed at their positions, one at each turn. The 
remaining 44 amino acids were randomly reshuffled. 

Naively, one may expect these sequences to show similar collapse behaviors, since 
the composition is the same. However, the protein A sequence turns out to collapse 
much more efflciently than the random sequences (see Fig. ^). The native structure 
has a radius of gyration of 9.25 A, which is significantly smaller than one finds for the 
random sequences in this temperature range. The specific heat (data not shown) has 
a pronounced peak in the region where the collapse occurs. Taking the maximum as 
the collapse temperature Tc, we obtain kTc = 0.62 for the protein A sequence. 

The chain collapse is not as abrupt for the protein A sequence as for the designed 
sequence studied in Ref. [^. This is not surprising, as that sequence has a hy- 
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Figure 5: (a) Free-energy profile F{Q) = —kT In P{Q) at kT = 0.54 (full line), where 
P{Q) is the probability distribution of Q. Also shown (dashed line) is the result 
for one of the random sequences at kT = 0.50. (b) Q, E scatter plot for quenched 
conformations with low energy. 



drophobicity pattern that fits its native structure perfectly. The protein A sequence 
does not have a fully perfect hydrophobicity pattern, but still the collapse behavior is 
highly cooperative, as can be seen from the comparison with the random sequences. 

Next, we turn to the structure of the collapsed state. As a measure of similarity with 
the native structure, we use 

Q = exp(-5Vl00 A^) , (7) 

where 6, as before, denotes rmsd. An alternative would be to base the similarity 
measure on the number of native contacts present, rather than rmsd. The problem 
with such a definition is that it does not provide an efficient discrimination between 



the two possible topologies of a three-helix bundle ||3^ — the third helix can be either 
in front of or behind the U formed by the first two helices. This problem is avoided 
by using rmsd. 



In Fig. we show the free-energy profile F{Q) in the collapsed phase at kT = 0.54. 
We see that there is a broad minimum at Q ~ 0.8-0.9, with two distinct local minima 
a.t Q = 0.78 and Q = 0.90, respectively. Both these minima correspond to the native 
overall topology. There is also a minimum at Q = 0.50, which corresponds to the 
wrong topology. The Q = 0.50 minimum is more narrow and slightly higher, so the 
native topology is the favored one. However, it should be stressed that it is difficult 
to discriminate between the two topologies using a pairwise additive potential (see 
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Sec. |3.4| ). To be able to do that in a proper way, it is likely that one has to include 
multibody terms and/or more side-chain atoms in the model. 



The main difference between the two minima at Q = 0.78 and Q = 0.90 lies in the 
shape and orientation of helix III, which comprises amino acids 41-55 in the native 
structure. At the Q = 0.78 minimum, there tends to be a sharp bend in this segment, 
and the amino acids before the bend, 41-44, are disordered rather than helical. The 
remaining amino acids, 45-55, tend to make a helix, but its orientation differs from 
that in the native structure. Relative to the Q = 0.90 minimum, where helix III is 
much more native-like, we find that the Q = 0.78 minimum is entropically favored 
but energetically disfavored. The separation in energy between these minima is prob- 
ably underestimated by our model. There is, for example, a stabilizing electrostatic 
interaction between helices I and III in the native structure (Glul6-Lys50), which 
should favor the Q = 0.90 minimum but is missing in our model. 

Also shown in Fig. ^ is the result for one of the random sequences. The probability 
of finding this sequence in the vicinity of the native structure is, not unexpectedly, 
very low. The same holds true for the other two random sequences too (data not 
shown) . 

To extract representative conformations for the collapsed state, we used simulated an- 
nealing followed by a conjugate- gradient minimization. Using this procedure, a large 
set of low-temperature Monte Carlo conformations were quenched to zero tempera- 
ture. In Fig. we show the quenched conformations with lowest energy in a Q,E 
scatter plot. Our minimum-energy structure is found aX Q = 0.44, corresponding to 
5 = 9.1 A. However, our thermodynamic calculations show that this conformation is 
not very relevant, in spite of its low energy. If we restrict ourselves to conformations 
with the native-like and thermodynamically most relevant topology, then the lowest 
energy is at Q = 0.97, corresponding to 6 = 1.8 A. This conformation is shown in 
Fig. ^ along with the native structure. It is worth noting that the Q = 0.44 and 
Q = 0.97 minima both were revisited in independent runs. 



These results can be compared with those of Scheraga and coworkers who tested 
an energy-based structure prediction method on the same sequence. With their 
energy function, the global minimum was found to have an rmsd of 3.8 A from the 
native structure (calculated over Cq, atoms). 
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Figure 6: Schematic illustrations of the native structure (left) and our minimum- 



energy structure for the native topology (right). Drawn with RasMol |33 

Segment Sequence Amino acids 

I QQNAFYEILHL 1(^20 

II NEEQRNGFIQSLKDD 24-38 

III QSANLLAEAKKLNDA 41-55 

Table 1: The one- helix fragments studied. 
3.2 Helix Stability 



Having discussed the overall thermodynamic behavior, we now take a closer look at 
the stability of the secondary structure and how it varies along the chain. To this 
end, we monitored the hydrogen-bond energy between the CO group of amino acid 
i and the NH group of amino acid i + 4 [see Eqs. (@,0)], ehb(0, as a function of i. 
This was done not only for the protein A sequence, but also for the corresponding 
three one-helix segments, which are listed in Table |I|. An experimental study ||23[ of 
essentially the same three segments found segment III to be the only one that shows 
some stability on its own. 

The results of our calculations are shown in Fig. |^, from which we see that the 
difference between the full sequence and the one-helix segments is not large in the 
model. However, the segments I and II definitely make less stable helices on their 
own than as interacting parts of the full system; they are stabilized by interhelical 
interactions. Furthermore, among the three one-helix segments, the model correctly 
predicts segment HI to be the most stable one. That this segment does not get more 
stable as part of the full system is probably related to the observation above that 
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Figure 7: Hydrogen-bond profile sliowing tlie normalized average energy of a-lielical 
liydrogen bonds, (ehb(0)/ehb, against amino acid number i, at kT = 0.58. Tlie 
full line represents the protein A sequence, whereas the dashed lines represent the 
corresponding three one-helix segments (see Table |l|). The thick horizontal lines 
indicate hydrogen bonds present in the native structure. 



helix III is distorted at the Q = 0.78 minimum. 

A striking detail in Fig. ^ is that the beginning of segment II is quite unstable. This 
can be easily understood. This segment has a flexible glycine at position 30, and 
the amino acids before the glycine, 24-29, are all polar, so there are no hydrophobic 
interactions that can help to stabilize this part. 



3.3 Kinetics 



Using the semi-local update 31 , we performed a set of 30 kinetic simulations at kT = 
0.54. The runs were started from random coils. There are big differences between 
these runs, partly because the system, as it should, sometimes spent a significant 
amount of time in the wrong topology. Nevertheless, the data show one stable and 
interesting trend, namely, that the formation of helices was never faster than the 
collapse. This is illustrated in Fig. |^, which shows the evolution of the similarity 
parameter Qq, the hydrogen-bond energy E'hb and the radius of gyration, Rg, in one 
of the runs. Qq is defined as Q in Eq. (|^), except that it measures similarity to the 
optimized model structure in Fig. ^ rather than the native structure. In Fig. |, we 
see that ii^hb converges slowly, whereas the collapse occurs relatively early. 
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Figure 8: Monte Carlo evolution of the similarity parameter Qq (top), the hydrogen- 
bond energy E]^\^ (middle) and the radius of gyration (bottom) in a kinetic simu- 
lation at kT = 0.54. 



Now, at a first glance, it may seem easy to make the helix formation faster by simply 
increasing the strength of the hydrogen bonds. Therefore, it is important to note 
that the hydrogen bonds cannot be made much stronger without making the ground 



state non-compact and thus destroying the three- helix bundle l^lj. This means that 
the conclusion that the collapse is at least as fast as helix formation holds for any 
reasonable choice of parameters in this model. 



It is interesting to compare these results to those of Zhou and Karplus ||I0|, who 
studied the same protein using a Go-type potential and observed fast folding when 
the Go forces were strong. Under these conditions, the helix formation was found to 
be fast, whereas the collapse was the rate-limiting step. 

However, a Go-like model ignores a large fraction of the interactions that drive the col- 
lapse, which can make the collapse artificially slow. In a recent Go model study , 



this problem was addressed by eliminating backbone terms from the potential until 
a reasonable helix stability was achieved. No such calibration was carried out in 
Ref . . This may explain why these authors find a behavior that our model cannot 



reproduce. 
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Let us finally mention that we also performed the same type of kinetic simulations 
for the designed sequence studied in Ref. ||2^ which, as discussed earlier, has a very 
abrupt collapse transition. It turns out that i?hb and Rg evolve in a strongly correlated 
manner in this case. So, the helix formation and collapse occur simultaneously for 
this sequence. 



3.4 Fine-tuning? 



In Sec. pA] , we discussed the relative weights of the two possible overall topologies, 
which is a delicate issue. What changes are needed in order for the model to more 
strongly suppress the wrong topology? Is it necessary to change the form of the 
energy function, or would it be sufficient to fine-tune the interaction matrix A(sj, Sj) 
in Eq. (|)? 

One way to do such a fine-tuning of A(sj, sj) would be to maximize (Q)', where Q 
is the similarity parameter and (■)' denotes a thermodynamic average restricted to 
compact conformations {Rg < 10 A say). This is essentially the overlap method of 
Ref. [^. The gradient of the quantity {Q)' can be written as 



- "-^{{Qxy-iQYixy), 



dA(si,Sj) kT 

where X is a sum of Lennard- Jones terms, (o"coi/rjj)^^ — 2{o'coi/rij)^, over all possible 
pairs of type Sj, Sj. 

We calculated the Q, X correlation in Eq. (H) for all pairs Si, sj with A(sj, Sj) = 1 at 
kT = 0.54, and found that \d{Q)' /dA{si, Sj)\ was small (< 0.15) for all these pairs. 
Hence, there is no sign that a significant increase in (Q)' can be achieved by fine- 
tuning A{si,Sj); the contact patterns seem to be too similar in the two topologies. 
To include more side-chain atoms and/or multibody terms in the model is likely to 
be a more fruitful approach. 



4 Conclusion 



We have explored a five-letter protein model with five to six atoms per amino acid, 
where the formation of native structure is driven by hydrogen bonding and effective 
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hydrophobicity forces. This model, which does not follow the Go prescription, was 
tested on a small but real sequence, a three-helix-bundle fragment from protein A. 

Using this model, the protein A sequence was found to collapse much more effi- 
ciently than random sequences with the same composition. In the collapsed phase, 
we found that the native topology dominates, although the suppression of the wrong 
three-helix-bundle topology is not strong. Energy minimization constrained to the 
thermodynamically favored topology gave a structure with an rmsd of 1.8 A from the 
native structure. 



In our kinetic simulations, the collapse was always at least as fast as helix forma- 
tion, which is in sharp contrast with previous results for the same protein that were 



obtained using a Go-like Cq, model [|T0|. A possible explanation for the conflicting 
conclusions is that the Go approximation makes the collapse artiflcially slow by ig- 
noring a large fraction of the interactions driving the collapse. In our model, the 
conclusion that the helix formation is not faster than collapse seems unavoidable; if 
one tries to speed up the helix formation by increasing the strength of the hydrogen 
bonds, then the chain does not fold into a compact helical bundle. 



The force fleld of our model was deliberately kept simple. In particular, the hy- 
drophobicity potential was taken to be pairwise additive, with a simple structure for 
the interaction matrix A(sj,Sj) [see Eq. @]. In the future, it would be very inter- 
esting to look into the behavior of the model in the presence of multibody terms. 
A simpler alternative is to stick to the pairwise additive potential and flne-tune the 
parameters A(sj, Sj). However, the calculations in this paper give no indication that 
there is much to be gained from such a flne-tuning. 
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